{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Structural Topic Model\n",
    "\n",
    "This R notebook reproduces the Structural Topic Model used in Step 1 of the Computational Grounded Theory project.\n",
    "\n",
    "Note: This notebook produces the model and then saves it. Producing the model can take quite a bit of time to run, upwards of four hours. To explore the topic models produced skip directly to the next notebook, `02-TopicExploration.ipynb`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Requirements and Dependencies\n",
    "\n",
    "Model created using R 3.4.0 \n",
    "\n",
    "Main libary: stm_1.2.2  \n",
    "\n",
    "Dependencies:\n",
    "* tm_0.7-1\n",
    "* NLP_0.1-10\n",
    "* SnowballC_0.5.1\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "stm v1.1.3 (2016-01-14) successfully loaded. See ?stm for help.\n"
     ]
    }
   ],
   "source": [
    "library(stm)\n",
    "\n",
    "### Load Data\n",
    "df <- read.csv('../data/comparativewomensmovement_dataset.csv', sep='\\t')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "##Pre-Processing\n",
    "\n",
    "temp<-textProcessor(documents=df$text_string,metadata=df)\n",
    "meta<-temp$meta\n",
    "vocab<-temp$vocab\n",
    "docs<-temp$documents\n",
    "out <- prepDocuments(docs, vocab, meta)\n",
    "docs<-out$documents\n",
    "vocab<-out$vocab\n",
    "meta <-out$meta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "##Produce Models\n",
    "\n",
    "### Model search across numbers of topics\n",
    "\n",
    "storage <- manyTopics(docs,vocab,K=c(20,30,40,50), prevalence=~org, data=meta, seed = 1234)\n",
    "\n",
    "mod.20 <- storage$out[[1]]\n",
    "mod.30 <- storage$out[[2]] \n",
    "mod.40 <- storage$out[[3]] \n",
    "mod.50 <- storage$out[[4]] "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "##Save Full Model, with four different topic models saved\n",
    "save.image(\"../data/stm_all.RData\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.3.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
